Building a Bilingual Representation of the Roget Thesaurus for French to English Machine Translation
نویسندگان
چکیده
This paper describes a solution to lexical transfer as a trade-off between a dictionary and an ontology. It shows its association to a translation tool based on morpho-syntactical parsing of the source language. It is based on the English Roget Thesaurus and its equivalent, the French Larousse Thesaurus, in a computational framework. Both thesaurii are transformed into vector spaces, and all monolingual entries are represented as vectors, with 1000 components for English and 873 for French. The indexing concepts of the respective thesaurii are the generation families of the vector spaces. A bilingual data structure transforms French entries into vectors in the English space, by using their equivalencies representations. Word sense disambiguation consists in choosing the appropriate vector among these ’bilingual’ vectors, by computing the contextualized vector of a given word in its source sentence, wading it in the English vector space, and computing the closest distance to the different entries in the bilingual data structure beginning with the same source string (i.e. French word). The process has been experimented on a 20, 000 words extract of a French novel, Le Petit Prince, and lexical transfer results were found quite encouraging with a recall of 86% and a precision of 71%.
منابع مشابه
Working with Russian Queries for the GIRT, Bilingual and Multilingual CLEF Tasks
For our activities within the CLEF 2001 evaluation, Berkeley group one participated in the bilingual, multilingual and GIRT tasks focussing on the use of Russian queries. Performance on the Russian queries !English documents bilingual task was excellent, comparable to performance using German queries. For the multilingual task we utilized English as a pivot language between Russian and German a...
متن کاملGIRT and the Use of Subject Metadata for Retrieval
The use of domain-specific metadata (subject keywords) is tested for monolingual and bilingual retrieval on the GIRT social science collection. A new technique, Entry Vocabulary Modules, which adds subject keywords selected from the controlled vocabulary to the query, has been tested. As in previous years, we compare our techniques of thesaurus matching and Entry Vocabulary Modules to simple ma...
متن کاملUC Berkeley at CLEF 2003 - Russian Language Experiments and Domain-Specific Cross-Language Retrieval
As in the previous years, Berkeley’s group 1 experimented with the domain-specific CLEF collection GIRT as well as with Russian as query and document language. The GIRT collection was substantially extended this year and we were able to improve our retrieval results for the query languages German, English and Russian. For the GIRT retrieval experiments, we utilized our previous experiences by c...
متن کاملGenerating Cross-lingual Concept Space from Parallel Corpora on the Web
The information available in languages other than English on the World Wide Web is increasing significantly. To cross language boundaries between different languages, dictionaries are the most typical tools. However, the general-purpose dictionary is less sensitive in genre and domain and it is impractical to manually construct tailored bilingual dictionaries or sophisticated multilingual thesa...
متن کاملTranslation Quality Assessment of English Equivalents of Persian Proper Nouns: A case of bilingual tourist signposts in Isfahan
Abstract This study evaluated the translation quality of English equivalents of Persian proper nouns in the tourist signs and bilingual boards in Isfahan. To find different errors in the translations of the bilingual boards and tourist signs, the data were collected directly by taking picture or writing exactly from the available tourist signs and bilingual boards. Then, the errors were assesse...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008